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Abstract 

Consider a set of order statistics that arise from sorting samples from two different 
populations, each with their own, possibly different distribution function. The 
probability that these order statistics fall in disjoint, ordered intervals, and that of 
the smallest statistics, a certain number come from the first populations, are given 
in terms of the two distribution functions. The result is applied to computing the 
joint probability of the number of rejections and the number of false rejections for the 
Benjamini-Hochberg false discovery rate procedure. 
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order statistics fall in disjoint, ordered intervals on the set of real numbers. In this paper, 
we extend this work and consider two sets of real valued, independent but not necessarily 
identically distributed random variables. We give expressions in terms of cumulative 
distribution functions for the probability that arbitrary subsets of order statistics fall in 
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disjoint, ordered intervals, and that of the smallest statistics, a certain number come from 
one set. We have been unable to find any previous papers on this topic. This problem 
is of interest in calculating probabilities for the Benjamini and Hochberg (1995) multiple 
comparisons procedure. 



2 A simple example 

Consider the following simple example. Let X\, X 2 G [0, 1] be independent random variables. 
Denote by Fx 1 (x\) and Fx 2 (x 2 ) the marginal cumulative distribution functions and by 
Fx 1 ,x 2 {xi,x 2 ) the joint cumulative distribution function of X\ and X 2 . Assume that 
the cumulative distribution functions are continuous. Let Y\ = min{Xi,X2} and let 
Y 2 = m.ax{Xi,X 2 } be the order statistics. For i — 1,2, write the marginal cumulative 
distribution function of Yj as Fy t (y^), and the joint cumulative distribution function as 
Fy 1} y 2 (2/1,2/2), for yi < y 2 . This joint cumulative distribution function is also continuous 
(David, 1981, p. 10). 

Choose numbers 61 < b 2 , bi, b 2 e (0, 1). We wish to find the probabilities 

A = Pr{(y 1 <b 1 )A(y 2 >b 2 )}, (1) 

/3 = Pr {( Vl < h) A (y 2 > b 2 ) A {x x < h)} (2) 

and 

7 = Pr {{ Vl < b x ) A {y 2 > b 2 ) A - {x t < h)} . (3) 

and express them in terms of the distribution functions Fx 1 and Fx 2 - First, we will find the 
probabilities directly. So, 

(3 = Pr{(xi < h) A (x 2 > b 2 )} 
= F Xl (h) [1 - F X2 (b 2 )) (4) 

and 

7 = Pr{(x x > b 2 ) A (x 2 < 61)} 
= [l-F Xl (b2)]F X2 (b 1 ). (5) 

Equations (j4j) and ([5]) follow directly from the independence of the random variables, and 
the definition of the cumulative distribution functions. Since 

{(yi < 61) A {y 2 > b 2 )} = (6) 
{(j/i < 61) A (y 2 > b 2 ) A (xi < 61)} U {{yi < b x ) A (y 2 > b 2 ) A -. {x x < 61)} 

and the union is disjoint, it follows that 

A = (3 + 1 . (7) 

For a problem with more than two order statistics, the number of cases one needs to 
consider and the number of possible combinations of statistics, subsets, and bounds makes 
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a direct approach impractical. An algorithmic approach to obtaining 7 and f3 will allow the 
generalization to an arbitrary number of order statistics. 

Using the assumption that the distribution functions are continuous, simple set 
operations, and the definition of distribution function, we obtain that the probability of 
the union (jSJ) is 

A = Pr{(y 1 <b 1 )A^(y 2 <b 2 )} (8) 
= Pr { Vl < h} - Pr {( Vl < b x ) A (y 2 < b 2 )} (9) 
= F Yl (b 1 )-F YlX2 (b 1 ,b 2 ). (10) 

The cumulative distributions of the order statistics can be written (Bapat and Beg, 1989), 

F Yl (h) = F Xl (h) [1 - F X2 {b x )\ + [1 - F Xl (60] F X2 (h) (11) 

F Yl ,y 2 (61, b 2 ) = F Xl (60 [F X2 (63) + F X2 (h)} - [F Xl (b 2 ) - F Xl (61)] F X2 (h) . (12) 

Then, substituting Equations ffTTj) and ffl2l into Equation (ITUl) . we can write A in terms of 
the distribution functions of X\ and X 2 , 

A =F Xl (60 [1 - F X2 (h)} + [1 - F Xl (h)) F X2 (60 (13) 
- F Xl (60 [F X2 (60 - F X2 (60] - (62) - F Xl (60] F x , 2 (60 
=F Xl (60 [1 - F X2 (62)] + [1 - F Xl (6 2 )] F X2 (60 • (14) 

We now interpret the terms in the sum in Equation (fl4|) . The term that includes F Xl (61) 
as a factor is the probability of an event in which x\ < 61 occurs, and the term that includes 
1 — F Xl (6 2 ) as a factor is the probability of an event in which x\ > b 2 . Since 61 < 6 2 , the 
two events are disjoint, and, consequently, ([7]) follows again. 

To summarize, we have expressed the probability in terms of the joint distribution of 
the order statistics, which was in turn written in terms of the distribution functions of 
the random variables. Finally, by recognizing terms that corresponded to a partition, we 
decomposed A into a sum of f3 and 7, the two probabilities of interest. 



3 General case 

The logic used in this simple, two random variables example can be generalized to an 
arbitrary number of random variables. Consider a set of order statistics that arise from 
sorting samples from two different populations, each with their own, possibly different 
distribution function. We wish to find the probability that these order statistics fall in 
a given union of intervals, and that of the smallest statistics, a certain number come from 
one population. 

For this general case, we need to introduce some notation and definitions. Let Xj, 
i = 1, . . .m, be independent but not necessarily identically distributed real valued random 
variables with values in the interval [0, 1] and continuous cumulative distribution functions 
F x . (xi). Partition the set {X 1? X 2 , . . . , X m } into two subsets, 

Si = {Xi, X 2 , . . . , X n } , S 2 = {X n+ i, X n+2 , . . . , X m } . (15) 
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For example, one can consider measurements for males or females, or for two different 
populations of breast cancer, slow or fast growing. The order statistics Yi,Y 2 , . . . ,Y m are 
random variables defined by sorting the values of Xj. Thus Y\ < Y 2 < . . . < Y m . Denote the 
realizations of the order statistics by y% < y 2 < . . . < y m . 

The arguments of the joint cumulative distribution function of order statistics are 
customarily written omitting redundant arguments; thus for 1 < e < m let 1 < n\ < n 2 < 
■ ■ ■ < n e < m, denote the indices of the order statistics of interest. The joint cumulative 
distribution function of the set [Y ni , Y n2 , . . . , Y ne }, which is a subset of the complete set of 
order statistics, is defined as 

F Yni ,...Y ne (l/i, ■ ■ ■ , Ve) = Pr ({Y m < Vl } n {Y n2 <y 2 }n---n {Y ne < y e }) . (16) 

Suppose we are given s <m disjoint intervals 

(c q , dg), = ci < di < c 2 < • • • < c s < d s = 1, (17) 

and integers 

s 

k q >0, ^2k q = m, (18) 

9=1 

where ko = and k q is the number of order statistics that fall in the q th interval. Define 
w q< i = 1 + ^ii an< ^ w g,k q — YH=i ki to be the subscripts of the largest and smallest 

order statistics, respectively, that fall in the q th interval. In the case when k q = 1, we have 
Wg^i = w q ^k q - Using this notation, the event that exactly k q of the order statistics fall in the 
q th interval is 

jci < Y W1A <■ ■■ < Y WlM < d x A • • • A c s < Y Ws>1 <■■■ < Y Wsks < 4} , (19) 

or, in a more compact notation (12T|) below. Now let B be another random event. The 
following theorem gives the probability of this event intersected with the event ( Fl9|) . in 
terms of the cumulative distribution functions of the order statistics relative to the event B. 
This distrubution function is defined by 

F Yni ,...Y ne ;B{yi,...,y e ) (20) 
= Pr ({Kh < yx} n {Y n2 <y 2 }n---n {Y ne < Ve } n B) 

Contrary to the usual convention, we do not require that the indices of the order statistics 
in the cumulative distribution function (120]) are sorted, because that would result in a 
complication of the notation in the next theorem (additional renumbering of the arguments). 

Theorem 1 Denote the event 

s 

E = P| ({c, < Y WqA } n {Y Wqkq < d q }) . (21) 

9=1 
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Then 

Pr (F n 5) =F Ywi ki >Yw2M _ Yws ka , B (di, d 2 , . . . , d q ) 

s 

~ E Fy ^ M > Y ™2 M >-,Y Wa>ks ,Y Wiihi ;B {dl, d 2 ,..., dg, C q ) 



8=1 
S 



+ E Fy ^ M ' Y ^'---' Yw sM' Y ^ Y ^^ rf 2, • • • , d q , Cr, Ct) 



r,t=l 
r<t 



+ ("I)" >^2,i^2, fe2 .-.^ s> i.^ s , fcs ;B dl, 02, d 2 , 

Proof. By standard set operations, 



<j=l ? =1 

and 

s s / s \ C 

n k < w = n ^ c 4 c = u ^ c 4 

g=l 5=1 \?=1 

where c denotes the complement. Therefore, 

EHB= ^{Y Wql <c q ]j nF, 
where the event F is defined by 

s 

F = f]{ Y ^ q <d *} nB - 

9=1 

By the additivity of probability, it follows from (I25I) that 



Pr (E n 5) = Pr (F) - Pr ( (J < c,} n F J = Pr (F) - Pr I (J A, 

\q=l / \ ? =1 / 

where A g = {Xo g ,i < c g } fl F. Using the additivity of probability again, we have 



Pr ( IM = E Pr w - E Pr n + ■ 

^<j=l / g=l r,t=l 

r<t 

+ (-i) s Pr (ha, 



< di 



> dt 



Si 

s 2 

Total 



j 


n-j 


kl - j 


(m-n) - (£4 - j) 



m — ki 



n 

(m — n) 
m 



Table 1: Numbers of order statistics from the sets Si and S2 in the interval (0, d{) and 
outside the interval (0, di), in the event B. 

Now putting fl26|) - ff29]) together and using the continuity of the cumulative distribution 
functions, we obtain 



Pr(EnB) = Pr(f){Y Wqkq <d q }nB 



s s > 

- E Pr { y »M < n fl [v Wqkq <d q }nB 



r=l \ q=l 



s / s 

+ E Pr < ct} n {Y WtA < ct} n f| {Y Wqkq <d q )r\B 



r,t=l \ q=l 
r<t 



'l,fc 1 ^-2, fc2 -- y - S , fes - y -r,l- y »i,l ;S(Cldl ' d2 '-' d9 ' C ''' Ct) 



+ Pr ( f| {Y Wql < c q } n f| < n 2? j , 

\q=l g=l / 

V v ' 

FY ^l,l^ l!kl Xw 2$1 ,Y W2k2 ,...Xw sA ,Y Wsks M c ^^ 

which concludes the proof. 
■ 

From now on assume that B is the event that exactly j elements of Si fall in the interval 
(0,|/i), for a given j < n. This event is shown in Table [TJ Thus, to compute the probability 
of interest, it is enough evaluate the cumulative distribution functions relative to the event B 
of the order statistic, given by (I20I) . An efficient method for the computation of cumulative 
distribution functions of order statistics from two populations was proposed by Glueck et 
al. (2007). Here we need a slight generalization, involving the event B, which requires a 
different proof. 

Theorem 2 Denote the index vector i = (io, ii, ■ ■ ■ i e +i) an d the summation index set 

1 =io d~ >-"f-^f< +i <r }• (3°) 

I and % a > n a for all 1 < a < k I 
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Suppose that F Xl (x) — F (x), for all 1 < i < n, and F Xi (x) = G (x), for all n + 1 < i < m. 
Then the cumulative distribution function relative to the event B (HZ|) is given by 

F Yni ,...Y ne ;B(yi,---,ye) (31) 

y-> n\ (m — n)\ 



■ [F (y a ) - F (y a _i)] x ° [G (y a ) - G (y^)]"- 1 ^ , 

where yo = 0, y e+ i = 1, and X = (Ai, A2, . . . , A e +i) ranges over all integer vectors such that 
Xi = j and 

Ai + A 2 H h A e+ i = n, < \ a < i a - ia-i- (32) 

Proof. Denote by A\ \ the event that exactly i a — i a -\ of the random variables Xi fall 
in the interval (y a -i,y a ), and exactly A a of those are elements of Si. When a = 1, 
(y a -i,y a ) = (yo,yi) = (0,yi). If B occurs, Ai = j. Then from the binomial theorem, 



Pr (A a) = ft x.r 1 ^ n) \v l F (Va) - F (Va-i)f a [G {ya) ~ G (y a -i)]"- ; 1 V 

, A a ! U a — l a -\ — A a )\ 
a=l v ' 

(33) 

Since the events A- lt \ for different (i, A) are disjoint, the result follows. ■ 

The only difference between Theorem |2] and the result by Glueck et al. (2007) is the 

added condition Ai = j. 

In the case of two random variables, we recover the same results as the direct method 

in Section [2j With m = 2, n = 1, s = 2, ci = 0, di — bi, C2 = 62, ^2 = 1, Si = {-^1}, 

S 2 = {Xi}, h = 1, k 2 = 1, Y W1>1 = Y WlM = Yi, Y W2A = Y W2 ki = Y 2 , using Theorem Q] and M 

yields 

(EH B) = 7 , (34) 

when j = 0, and 

(E n B) = p, (35) 

when j = 1. 

In conclusion, for two sets of real valued, independent but not necessarily identically 
distributed random variables, we have now given an expression for the probability that 
arbitrary subsets of order statistics fall in disjoint, ordered intervals, and that of the smallest 
statistics, a certain number come from one set. 



4 Concluding example 

The methods of this paper can be used to calculating the joint probability of the number of 
rejections and the number of false rejection for the Benjamini-Hochberg (1995) procedure. 
A rejection of a hypothesis for which the null holds is a false rejection. Given an false 
discovery rate a 6 (0, 1), hypotheses Hi i = 1, . . . , m, p- values X^ and the corresponding 
order statistics for the p-values Yi = Xu) (the random variables Xi sorted in nondecreasing 
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order X^ < X( 2 ) < • • • < X( m )), the procedure produces a nondecreasing sequence of 
numbers 6j = ia/m e (0, 1), rejects the hypotheses i?( e ), e = 1, . . . , ki, such that fci is the 
largest number for which < b^, and accepts all others. For n£ {0,1,..., m} assume that 
the null holds for Hi, H 2 , . . . , H n and that the alternative holds for H n+ i, H n+2 , . . . , H m . Let 

51 = {Xi,X 2 , . . . ,X n } be the set of p-values that correspond to the null hypotheses, and 

5 2 = {X n+ i, X n+2 , . . . ,X m } be the set of p-values for which the alternative holds. Then j 
is the number of null hypotheses that are rejected, which is equal to the number of p-values 
corresponding to null hypotheses that fall in the interval [0,6feJ. 

Under the assumption that the p-values for which the alternative holds have the same 
distribution, one can use the methods of this paper to find the joint distribution of j and ki. 
For each value of ki and m, Glueck et al. (2006a) pointed out that the rejection regions for 
the Benjamini and Hochberg (1995) procedure can be decomposed into disjoint sets of events. 
These events correspond to certain numbers of order statistics falling into sets of intervals, 
defined by the numbers &«. Details about the decomposition of the rejection regions into 
these events are given in Glueck et al. (2006a). The general case is too complicated to detail 
here. However, as an example, we calculate the probabilities that with m = 2 hypotheses, 
and n = 1 null hypotheses, the Benjamini and Hochberg (1995) procedure rejects ki — 1 
hypotheses, and that j, the number of false rejections, is either or 1. 

Suppose we wish to test m = 2 hypotheses. Specifically, we wish to test hypotheses 
about the location of the sample mean. We plan to conduct a two sided test. We assume 
that we have two large populations, with known variances (both a 2 ), and that the variables of 
interest, say ei and e 2 , are normally distributed, so that ei ~ N (/ii, a 2 ) and e 2 ~ iV (/i 2 , a 2 ). 
We wish to test two hypotheses Hi : fii = fx , and H 2 : fi 2 = fi , with the alternative 
hypothesis for both populations the same, so Ha : /j, = /j,jy. We sample JVj random variables 
from each population, say en, . . . e^. For convenience, we will assume that the random 
sample is of the same size for each hypothesis test, so iVi = iV2 = N. 

With 

N 

e i = N- 1 Y,eiS, (36) 



the test statistics are given by 



a 



5=1 



-i 

- Mo) , (37) 



'N 

and the two sided p-values are (Rosner, p. 244, 2006) 

/2*(Z0 Z,<0 

where $ is the cumulative distribution function of the standard normal (mean = and 
variance = 1). Let be the probability density function of the standard normal. 
Suppose that in truth, we have ei ~ N(fi ,a 2 ), so that the null holds for Hi, and 
e 2 ~ N(fi A ,a 2 ), so that alternative holds for H 2 . Define Si = {Xi}, and S 2 = {X 2 }. 
Then the number of p- value for which the null holds, n — 1. For Hi, the hypotheses for 
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h 


3 


Theory 


Simulation 


Difference 


l 





.472982 


.47388 


.000898 


l 


1 


.00978051 


.0095 


.00028051 



Table 2: Comparison of Simulation and Theory. Recall that k\ is the number of hypotheses 
that were rejected, and j is the number of null hypotheses that were rejected. We had two 
hypotheses, and one null hypothesis. 



which the null holds, the p- value has a uniform distribution on the interval [0,1], so for 
x x e [0,1], 

F Xl (xi) = X\. (39) 

For H 2 , the alternative holds. When we conduct the hypothesis test, we are unaware of 
the truth. We always calculate the p-value under the null. However, since the alternative 
actually holds, 

Q — A^o 



Pr [Z 2 < z 2 \ =Pr 



Pr 



Vn 
e« — Ha 



(40) 



<z 2 + 



z 2 + 



Vn 

Ho — HA 



Ho — HA 

a 

Vn 



Vn 



Finally, 



Fx 2 (x 2 ) 



Pr (X 2 < x 2 ) 

Pr ({X 2 < x 2 } n {Z 2 < 0}) + Pr ({X 2 < x 2 } n {Z 2 > 0}) 

Pr ({2$ (Z 2 ) < x 2 }) + Pr ({2 [1 - $ {Z 2 )\ < x 2 }) 

Pr ({Z 2 < (x 2 /2)}) + 1 - Pr ({Z 2 < ^ l (1 - x 2 /2)}) 



(41) 



Q- 1 (x 2 /2) + 



Ho — Ha 

Vn 



+ 1 - $ 



$- 1 (l-x 2 /2) + 



A^o ~ Ha 

Vn 



where the last step follows by substitution from Equation H0J 

Now, as a specific example, we fix /x = 0, fiA = 1, cr 2 = 1 a = .05. We wish to calculate 
the probability that k\ = 1, and that j = or j = 1. With c\ = 0, d\ = a/2, c 2 = a, 
d 2 = 1. This is the probability that of the two hypotheses, we reject exactly one, and it is 
Hi, the hypothesis for which the null holds. When j = 0, the rejection we make is of the 
hypothesis for which the alternative holds, and when j = 1, the rejection we make is of the 
null hypothesis, a false rejection. 

We calculated the probability using our methodology, and by a simulation using a sample 
of 100,000 variables. Recall that k\ is the number of order statistics that are less than b\, 
and j are the number in Set 1, and less than b%. The results are shown in Table [3 
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Notice that the simulation differs from the theory only in the fourth decimal place. The 
theory is exact. Software that implements this method in Mathematica is available from the 
authors upon request. 
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